Reinforcement Learning with Perturbed Rewards
Recent studies have shown that reinforcement learning (RL) models are vulnerable in various noisy scenarios. For instance, the observed reward channel is often subject to noise in practice (e.g., when rewards are collected through sensors) and therefore cannot be fully trusted. In addition, in applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated into producing arbitrary errors by feeding it corrupted rewards. In this paper, we consider noisy RL problems with perturbed rewards, where the perturbation process can be modeled by a confusion matrix. We develop a robust RL framework that
enables agents to learn in noisy environments where only perturbed rewards are
observed. Our solution framework builds on existing RL/DRL algorithms and
is the first to address the biased noisy-reward setting without any assumption on the true noise distribution (e.g., the zero-mean Gaussian noise assumed in previous works). The core ideas of our solution include estimating a reward confusion
matrix and defining a set of unbiased surrogate rewards. We prove the
convergence and sample complexity of our approach. Extensive experiments on
different DRL platforms show that trained policies based on our estimated
surrogate reward can achieve higher expected rewards, and converge faster than
existing baselines. For instance, the state-of-the-art PPO algorithm obtains 84.6% and 80.8% improvements on average score across five Atari games, with error rates of 10% and 30%, respectively.
Comment: AAAI 2020 (Spotlight)
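To make the surrogate-reward construction concrete, here is a minimal sketch in NumPy, assuming a known confusion matrix over a discrete reward support; the flip rates and reward levels below are illustrative, not values from the paper.

```python
import numpy as np

# Discrete reward support and a hypothetical confusion matrix C, where
# C[i, j] = P(observe reward_values[j] | true reward is reward_values[i]).
reward_values = np.array([-1.0, 1.0])
C = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Unbiased surrogate rewards solve C @ r_hat = reward_values, so that
# E[r_hat(observed) | true reward r_i] = sum_j C[i, j] * r_hat[j] = r_i.
r_hat = np.linalg.solve(C, reward_values)

def surrogate(observed_reward):
    """Map an observed (possibly flipped) reward to its unbiased surrogate."""
    j = int(np.argmin(np.abs(reward_values - observed_reward)))
    return r_hat[j]

# Sanity check: averaging the surrogate under the noise model recovers
# the true reward, which is what keeps standard RL updates unbiased.
for i, r in enumerate(reward_values):
    assert np.isclose(C[i] @ r_hat, r)
```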
On-Device Domain Generalization
We present a systematic study of domain generalization (DG) for tiny neural
networks. This problem is critical to on-device machine learning applications
but has been overlooked in the literature, where research has focused almost exclusively on large models. Tiny neural networks have far fewer parameters and lower complexity, and therefore should not be trained the same way as their
large counterparts for DG applications. By conducting extensive experiments, we
find that knowledge distillation (KD), a well-known technique for model
compression, is much better for tackling the on-device DG problem than
conventional DG methods. Another interesting observation is that the
teacher-student gap on out-of-distribution data is bigger than that on
in-distribution data, which highlights the capacity mismatch issue as well as
the shortcoming of KD. We further propose a method called out-of-distribution
knowledge distillation (OKD) where the idea is to teach the student how the
teacher handles out-of-distribution data synthesized via disruptive data
augmentation. Without adding any extra parameters to the model -- hence keeping
the deployment cost unchanged -- OKD significantly improves DG performance for
tiny neural networks in a variety of on-device DG scenarios for image and
speech applications. We also contribute a scalable approach for synthesizing
visual domain shifts, along with a new suite of DG datasets to complement
existing testbeds.
Comment: Preprint
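A minimal sketch of the OOD distillation idea in PyTorch, assuming a generic disruptive augmentation; the temperature, mixing weight, and the `augment` callable are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def okd_loss(student, teacher, x, y, augment, T=4.0, alpha=0.5):
    """Distill how the teacher handles synthesized OOD inputs into the student."""
    x_ood = augment(x)                    # disruptive augmentation -> synthetic OOD data
    with torch.no_grad():
        t_logits = teacher(x_ood)         # teacher's behavior on the OOD samples
    s_logits = student(x_ood)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(student(x), y)   # standard supervised loss on in-distribution data
    return alpha * kd + (1 - alpha) * ce
```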
Panoptic Scene Graph Generation
Existing research addresses scene graph generation (SGG) -- a critical
technology for scene understanding in images -- from a detection perspective,
i.e., objects are detected using bounding boxes followed by prediction of their
pairwise relationships. We argue that such a paradigm causes several problems
that impede the progress of the field. For instance, bounding box-based labels
in current datasets usually contain redundant classes like hair, and leave out
background information that is crucial to the understanding of context. In this
work, we introduce panoptic scene graph generation (PSG), a new task
that requires the model to generate a more comprehensive scene graph
representation based on panoptic segmentations rather than rigid bounding
boxes. A high-quality PSG dataset, which contains 49k well-annotated
overlapping images from COCO and Visual Genome, is created for the community to
keep track of its progress. For benchmarking, we build four two-stage
baselines, which are modified from classic methods in SGG, and two one-stage
baselines called PSGTR and PSGFormer, which are based on the efficient
Transformer-based detector, i.e., DETR. While PSGTR uses a set of queries to
directly learn triplets, PSGFormer separately models the objects and relations
in the form of queries from two Transformer decoders, followed by a
prompting-like relation-object matching mechanism. Finally, we share insights on open challenges and future directions.
Comment: Accepted to ECCV'22 (Paper ID #222, Final Score 2222). Project Page: https://psgdataset.org/. OpenPSG Codebase: https://github.com/Jingkang50/OpenPSG
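For illustration, a PSG sample can be viewed as a panoptic segmentation plus relation triplets over segment ids; the sketch below uses hypothetical field names, not the actual OpenPSG schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PSGSample:
    """Relations are grounded in pixel-accurate segments, not bounding boxes."""
    panoptic_seg: np.ndarray               # (H, W) map of segment ids, covering things and stuff
    segment_labels: list[str]              # class name per segment id
    relations: list[tuple[int, int, str]]  # (subject segment, object segment, predicate)

sample = PSGSample(
    panoptic_seg=np.zeros((480, 640), dtype=np.int32),
    segment_labels=["person", "bench", "grass"],
    relations=[(0, 1, "sitting on"), (1, 2, "on")],
)
```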
A proactive controller for human-driven robots based on force/motion observer mechanisms
This article investigates human-driven robots in physical human-robot interaction, where performance is enhanced by integrating the human partner's motion intention. A human motor control model is employed to estimate the human partner's motion intention, and a system observer is developed to estimate the human's control input in this model, so that force sensing is not required. A robot controller is then designed to incorporate the estimated motion intention, making the robot proactively follow the human partner's movements. Simulations and experiments on a physical robot demonstrate the properties of the proposed controller.
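One way to realize such an observer is to treat the unmeasured human input as an extra, slowly varying state and estimate it with a Luenberger-style observer; the toy dynamics and gains below are assumptions for illustration, not the article's model.

```python
import numpy as np

dt = 0.001
A = np.array([[0.0, 1.0], [0.0, -1.0]])   # toy 1-DoF robot dynamics, state x = [q, dq]
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])                # only position is measured (no force sensor)

# Augmented model: d/dt [x; u_h] = [[A, B], [0, 0]] @ [x; u_h], with u_h the
# (slowly varying) human control input treated as an extra state.
Aa = np.block([[A, B], [np.zeros((1, 3))]])
Ca = np.hstack([C, np.zeros((1, 1))])
L = np.array([[30.0], [300.0], [1000.0]]) # observer gains, hand-tuned for this sketch

z = np.zeros((3, 1))                      # estimate [q_hat, dq_hat, u_h_hat]

def observer_step(y):
    """One Euler step of the observer; returns the estimated human input."""
    global z
    z = z + dt * (Aa @ z + L @ (y - Ca @ z))
    return z[2, 0]
```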
CADSim: Robust and Scalable in-the-wild 3D Reconstruction for Controllable Sensor Simulation
Realistic simulation is key to enabling safe and scalable development of self-driving vehicles. A core component is simulating the sensors so that the
entire autonomy system can be tested in simulation. Sensor simulation involves
modeling traffic participants, such as vehicles, with high-quality appearance
and articulated geometry, and rendering them in real time. The self-driving
industry has typically employed artists to build these assets. However, this is
expensive, slow, and may not reflect reality. Instead, reconstructing assets
automatically from sensor data collected in the wild would provide a better
path to generating a large and diverse set of assets with good real-world coverage.
Nevertheless, current reconstruction approaches struggle on in-the-wild sensor
data, due to its sparsity and noise. To tackle these issues, we present CADSim,
which combines part-aware object-class priors via a small set of CAD models
with differentiable rendering to automatically reconstruct vehicle geometry,
including articulated wheels, with high-quality appearance. Our experiments
show our method recovers more accurate shapes from sparse data compared to
existing approaches. Importantly, it also trains and renders efficiently. We
demonstrate our reconstructed vehicles in several applications, including
accurate testing of autonomy perception systems.
Comment: CoRL 2022. Project page: https://waabi.ai/cadsim
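As a rough illustration of fitting a shape prior to sparse in-the-wild points, the sketch below optimizes a deformed template against a point cloud with a Chamfer loss in PyTorch. Note that this substitutes a simple point-cloud objective for CADSim's differentiable rendering, and every shape, size, and weight here is made up.

```python
import torch

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets of shape (N, 3) and (M, 3)."""
    d = torch.cdist(a, b)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

template = torch.randn(500, 3)            # stand-in for CAD template vertices
lidar = torch.randn(80, 3)                # stand-in for sparse, noisy lidar points
offset = torch.zeros(500, 3, requires_grad=True)
scale = torch.ones(1, requires_grad=True)

opt = torch.optim.Adam([offset, scale], lr=1e-2)
for step in range(200):
    verts = scale * template + offset
    # The penalty on offsets keeps the shape close to the CAD prior,
    # mimicking the part-aware prior idea at a very coarse level.
    loss = chamfer(verts, lidar) + 1e-2 * offset.pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```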
Large Language Models are Visual Reasoning Coordinators
Visual reasoning requires multimodal perception and commonsense cognition of
the world. Recently, multiple vision-language models (VLMs) have been proposed
with excellent commonsense reasoning ability in various domains. However, how
to harness the collective power of these complementary VLMs is rarely explored.
Existing methods like ensembling still struggle to aggregate these models with the desired higher-order communication. In this work, we propose Cola, a novel
paradigm that coordinates multiple VLMs for visual reasoning. Our key insight
is that a large language model (LLM) can efficiently coordinate multiple VLMs
by facilitating natural language communication that leverages their distinct
and complementary capabilities. Extensive experiments demonstrate that our
instruction tuning variant, Cola-FT, achieves state-of-the-art performance on
visual question answering (VQA), outside knowledge VQA, visual entailment, and
visual spatial reasoning tasks. Moreover, we show that our in-context learning
variant, Cola-Zero, exhibits competitive performance in zero-shot and few-shot
settings, without finetuning. Through systematic ablation studies and
visualizations, we validate that a coordinator LLM indeed comprehends the
instruction prompts as well as the separate functionalities of VLMs; it then
coordinates them to enable impressive visual reasoning capabilities.
Comment: Accepted at NeurIPS 2023
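The coordination pattern can be sketched in a few lines: query each VLM independently, then let an LLM aggregate their answers in natural language. The callables and prompt wording below are stand-ins, not the paper's actual prompts or models.

```python
def coordinate(question, image, vlms, llm):
    """Ask each VLM, then have the LLM weigh their (possibly conflicting) answers."""
    candidates = [vlm(image, question) for vlm in vlms]
    prompt = (
        f"Question: {question}\n"
        + "".join(f"Expert {i} answers: {a}\n" for i, a in enumerate(candidates))
        + "Considering the experts' strengths and weaknesses, the best answer is:"
    )
    return llm(prompt)
```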
Spatial iterative learning control for robotic path learning
A spatial iterative learning control (sILC) method is proposed for a robot to learn a desired path in an unknown environment. When interacting with the environment, the robot starts with a predefined trajectory, so an interaction force is generated. By assuming that the environment is subject to fixed spatial constraints, a learning law is proposed to update the robot's reference trajectory so that a desired interaction force is achieved. Unlike existing iterative learning control methods in the literature, this method does not require the interaction with the environment to be repeated in time, which relaxes the assumptions on the environment and thus addresses the limitations of existing methods. Together with a rigorous convergence analysis, simulation and experimental results in two applications, surface exploration and teaching by demonstration, illustrate the significance and feasibility of the proposed method.
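A minimal sketch of a spatial learning law of this kind, assuming the path is discretized at fixed spatial points and the reference is nudged by the force error at each point; the gain, desired force, and simulated force measurements are all illustrative.

```python
import numpy as np

N = 200                      # number of spatial sample points along the path
x_ref = np.zeros(N)          # reference trajectory, indexed by spatial point, not time
f_desired = 5.0              # desired interaction force along the path
beta = 1e-3                  # learning gain

def silc_update(x_ref, f_measured):
    """One learning iteration over the spatial axis: push the reference toward
    the surface where the force is too low, away from it where it is too high."""
    return x_ref + beta * (f_desired - f_measured)

for iteration in range(50):
    # Stand-in for the force measured along the path in this iteration.
    f_measured = np.random.default_rng(iteration).normal(4.0, 0.1, N)
    x_ref = silc_update(x_ref, f_measured)
```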